Assert a model
Decide on a test statistic
Construct the sampling distribution
See where your observed stat lies in that distribution
\[N_{pairs} = 9; \quad N_{singles} = 5\]
We'll use simulation.
Create a population of socks, here 11 pairs (A to K) and 5 singles (l to p):
sock_pairs <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K")
sock_singles <- c("l", "m", "n", "o", "p")
socks <- c(rep(sock_pairs, each = 2), sock_singles)
socks
##  [1] "A" "A" "B" "B" "C" "C" "D" "D" "E" "E" "F" "F" "G" "G" "H" "H" "I" "I" "J"
## [20] "J" "K" "K" "l" "m" "n" "o" "p"
picked_socks <- sample(socks, size = 11, replace = FALSE); picked_socks
## [1] "I" "F" "m" "K" "D" "A" "n" "C" "E" "G" "E"
sock_counts <- table(picked_socks); sock_counts
## picked_socks
## A C D E F G I K m n
## 1 1 1 2 1 1 1 1 1 1
n_singles <- sum(sock_counts == 1); n_singles
## [1] 9
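The three steps above (build the population, draw a sample, count the singles) can be wrapped in one function. The slides call `pick_socks()` without showing its body, so the following is only a sketch of what such a function might look like, assuming the argument names `N_pairs`, `N_singles`, and `N_pick` and assuming it returns the number of unmatched socks in the draw.

```r
# Hypothetical reconstruction: the source calls pick_socks() but does not define it.
pick_socks <- function(N_pairs, N_singles, N_pick) {
  # Give every kind of sock an integer label: pairs appear twice, singles once
  n_kinds <- N_pairs + N_singles
  socks <- rep(seq_len(n_kinds),
               times = rep(c(2, 1), times = c(N_pairs, N_singles)))
  # Draw N_pick socks without replacement
  picked_socks <- sample(socks, size = N_pick, replace = FALSE)
  # Count the labels that were drawn exactly once
  sock_counts <- table(picked_socks)
  sum(sock_counts == 1)
}
```

Because each matched pair in the draw removes two socks from the singles count, a draw of 11 socks can only yield an odd number of singles.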
pick_socks(N_pairs = 9, N_singles = 5, N_pick = 11)
## [1] 9
pick_socks(9, 5, 11)
## [1] 7
pick_socks(9, 5, 11)
## [1] 7
Repeat many, many times…
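One way to repeat the draw is with `replicate()`. This is a minimal sketch, assuming 1000 repetitions (the divisor used later in the slides), the model of 9 pairs and 5 singles, and an arbitrary seed added only for reproducibility.

```r
set.seed(42)  # arbitrary seed, an assumption for reproducibility only

# Population under the model: 9 pairs (labels 1-9, two socks each) and 5 singles (labels 10-14)
socks <- c(rep(1:9, each = 2), 10:14)

# Draw 11 socks, count the singles, and repeat 1000 times
sim_singles <- replicate(1000, {
  picked_socks <- sample(socks, size = 11, replace = FALSE)
  sum(table(picked_socks) == 1)
})

length(sim_singles)
```

The resulting vector is the simulated sampling distribution of the test statistic under the model.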
Quantifying how far into the tails our observed count was.
table(sim_singles)
## sim_singles
##   1   3   5   7   9  11
##   2  48 248 411 250  41
table(sim_singles)["11"] / 1000
##    11
## 0.041
Doubling this upper-tail proportion gives a two-tailed p-value of 0.082.
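The same arithmetic can be checked without relying on the position of a table entry. The sketch below rebuilds the simulated counts from the frequency table shown above and assumes the observed statistic was 11 singles, the entry the slides index.

```r
# Rebuild the 1000 simulated counts from the frequency table above
sim_singles <- rep(c(1, 3, 5, 7, 9, 11),
                   times = c(2, 48, 248, 411, 250, 41))

observed <- 11  # assumed observed count of singles

# Proportion of simulations at least as extreme as the observed count
p_upper <- mean(sim_singles >= observed)
p_upper       # 0.041

# Doubling convention for a two-tailed p-value
2 * p_upper   # 0.082
```

Using `mean(sim_singles >= observed)` keeps working even if one of the possible counts never appears in the simulation.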
The result of a hypothesis test is a probability of the form:
\[ \mathbb{P}(\textrm{ data or more extreme } | \ H_0 \textrm{ true }) \]
while most people think they're getting
\[ \mathbb{P}(\ H_0 \textrm{ true } | \textrm{ data or more extreme}) \]
How can we go from the former to the latter?
\[\mathbb{P}(A \ | \ B) = \frac{\mathbb{P}(A \textrm{ and } B)}{\mathbb{P}(B)} \]
\[\mathbb{P}(A \ | \ B) = \frac{\mathbb{P}(B \ | \ A) \ \mathbb{P}(A)}{\mathbb{P}(B)} \]
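The second identity follows from the first. Applying the definition of conditional probability with the roles of \(A\) and \(B\) swapped, then rearranging, gives the joint probability that is substituted into the numerator:

\[\mathbb{P}(B \ | \ A) = \frac{\mathbb{P}(A \textrm{ and } B)}{\mathbb{P}(A)} \quad \Longrightarrow \quad \mathbb{P}(A \textrm{ and } B) = \mathbb{P}(B \ | \ A) \ \mathbb{P}(A) \]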
\[\mathbb{P}(\textrm{model} \ | \ \textrm{data or more extreme}) = \frac{\mathbb{P}(\textrm{data or more extreme} \ | \ \textrm{model}) \ \mathbb{P}(\textrm{model})}{\mathbb{P}(\textrm{data or more extreme})} \]
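A small numeric sketch shows how strongly the left-hand side depends on \(\mathbb{P}(\textrm{model})\). It assumes only two competing models, takes 0.041 from the simulation as the probability of the data (or more extreme) under the null model, and uses a made-up value of 0.30 for the same probability under the alternative. The prior values are also hypothetical.

```r
p_data_given_h0 <- 0.041  # upper-tail proportion from the simulation
p_data_given_h1 <- 0.30   # hypothetical value, chosen only for illustration

# Three hypothetical prior probabilities for the null model
prior_h0 <- c(0.1, 0.5, 0.9)

# Bayes' rule; the denominator sums over both models
p_data <- p_data_given_h0 * prior_h0 + p_data_given_h1 * (1 - prior_h0)
posterior_h0 <- p_data_given_h0 * prior_h0 / p_data

round(posterior_h0, 3)
```

The same p-value maps to very different posterior probabilities depending on the prior, which is why the two conditional probabilities should not be confused.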
What does it mean to think about \(\mathbb{P}(\textrm{model})\)?
A prior distribution is a probability distribution for a parameter that summarizes the information that you have before seeing the data.
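For the sock problem, a prior might be placed on the total number of socks and on the proportion of socks that come in pairs. The slides do not show which priors were used, so the distributions and parameter values below are assumptions chosen for illustration: a negative binomial with mean 30 and standard deviation 15 for the count, and a Beta(15, 2) for the proportion.

```r
set.seed(1)  # arbitrary seed for reproducibility
n_draws <- 10000

# Hypothetical prior on the total number of socks
prior_mu <- 30
prior_sd <- 15
# Negative binomial variance is mu + mu^2 / size, so solve for size
prior_size <- prior_mu^2 / (prior_sd^2 - prior_mu)
n_socks <- rnbinom(n_draws, mu = prior_mu, size = prior_size)

# Hypothetical prior on the proportion of socks that are paired
prop_pairs <- rbeta(n_draws, shape1 = 15, shape2 = 2)

summary(n_socks)
summary(prop_pairs)
```

Each draw is one plausible state of the laundry before any socks have been observed.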
head(sock_sim, 3)
##   unique pairs n_socks prop_pairs
## 1      3     4      16      0.970
## 2      7     2      33      0.914
## 3      9     1      51      0.929
sock_sim %>% filter(unique == 11, pairs == 0) %>% head(3)
##   unique pairs n_socks prop_pairs
## 1     11     0      49      0.692
## 2     11     0      37      0.873
## 3     11     0      49      0.815
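The table and filter above are consistent with a simulate-then-filter approach (approximate Bayesian computation): draw parameters from the prior, simulate a draw of 11 socks, and keep only the simulations that reproduce the observed data. The slides do not show how `sock_sim` was built, so this is a sketch under assumptions: the priors, the number of simulations, and the helper name `simulate_once()` are all hypothetical, and base R subsetting stands in for the `filter()` call.

```r
set.seed(1)     # arbitrary seed for reproducibility
n_picked <- 11  # matches unique + 2 * pairs = 11 in the rows shown above

# Hypothetical helper: one draw from the priors plus one simulated sample
simulate_once <- function() {
  # Assumed priors, not taken from the source
  n_socks <- rnbinom(1, mu = 30, size = 30^2 / (15^2 - 30))
  prop_pairs <- rbeta(1, shape1 = 15, shape2 = 2)
  # Turn the parameters into a concrete population of labeled socks
  n_pairs <- round(floor(n_socks / 2) * prop_pairs)
  n_odd <- n_socks - 2 * n_pairs
  socks <- rep(seq_len(n_pairs + n_odd),
               times = rep(c(2, 1), times = c(n_pairs, n_odd)))
  # Draw the sample and summarize it
  picked <- sample(socks, size = min(n_picked, n_socks))
  counts <- table(picked)
  c(unique = sum(counts == 1), pairs = sum(counts == 2),
    n_socks = n_socks, prop_pairs = prop_pairs)
}

sock_sim <- as.data.frame(t(replicate(20000, simulate_once())))

# Keep only simulations that match the observed data: 11 unique socks, 0 pairs
post <- sock_sim[sock_sim$unique == 11 & sock_sim$pairs == 0, ]
nrow(post)
median(post$n_socks)
```

The retained `n_socks` values approximate the posterior distribution of the total number of socks, and their median is one possible point estimate.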
\[ 21 \times 2 + 3 = 45 \textrm{ socks} \]
Bayesian methods . . .